Machine Learning 58
♻ ☆ MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes
Several studies showed that Large Language Models (LLMs) can answer medical
questions correctly, even outperforming the average human score in some medical
exams. However, to our knowledge, no study has been conducted to assess the
ability of language models to validate existing or generated medical text for
correctness and consistency. In this paper, we introduce MEDEC
(https://github.com/abachaa/MEDEC), the first publicly available benchmark for
medical error detection and correction in clinical notes, covering five types
of errors (Diagnosis, Management, Treatment, Pharmacotherapy, and Causal
Organism). MEDEC consists of 3,848 clinical texts, including 488 clinical notes
from three US hospital systems that were not previously seen by any LLM. The
dataset has been used for the MEDIQA-CORR shared task to evaluate seventeen
participating systems [Ben Abacha et al., 2024]. In this paper, we describe the
data creation methods and we evaluate recent LLMs (e.g., o1-preview, GPT-4,
Claude 3.5 Sonnet, and Gemini 2.0 Flash) for the tasks of detecting and
correcting medical errors requiring both medical knowledge and reasoning
capabilities. We also conducted a comparative study where two medical doctors
performed the same task on the MEDEC test set. The results showed that MEDEC is
a sufficiently challenging benchmark to assess the ability of models to
validate existing or generated notes and to correct medical errors. We also
found that, although recent LLMs perform well at error detection and
correction, medical doctors still outperform them on these tasks. We
discuss the potential factors behind this gap, the insights from our
experiments, the limitations of current evaluation metrics, and share potential
pointers for future research.
comment: This version has been updated with further clarification regarding
the model size estimates that were mined from public articles only and
provided to aid in contextualizing model performance. The authors cannot
vouch for the accuracy of those estimates.
♻ ☆ Sparsely Multimodal Data Fusion
Multimodal data fusion is essential for applications requiring the
integration of diverse data sources, especially in the presence of incomplete
or sparsely available modalities. This paper presents a comparative study of
three multimodal embedding techniques, Modal Channel Attention (MCA), Zorro,
and Everything at Once (EAO), to evaluate their performance on sparsely
multimodal data. MCA introduces fusion embeddings for all combinations of input
modalities and uses attention masking to create distinct attention channels,
enabling flexible and efficient data fusion. Experiments on two datasets with
four modalities each, CMU-MOSEI and TCGA, demonstrate that MCA outperforms
Zorro across ranking, recall, regression, and classification tasks and
outperforms EAO across regression and classification tasks. MCA achieves
superior performance by maintaining robust uniformity across unimodal and
fusion embeddings. While EAO performs best in ranking metrics due to its
approach of forming fusion embeddings post-inference, it underperforms in
downstream tasks requiring multimodal interactions. These results highlight the
importance of contrasting all modality combinations in constructing embedding
spaces and offer insights into the design of multimodal architectures for
real-world applications with incomplete data.
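One plausible reading of MCA's attention masking can be sketched in plain Python; the set-based token encoding and the containment rule below are our illustrative assumptions, not the paper's exact formulation:

```python
def mca_attention_mask(entries):
    """Build a boolean attention mask for Modal-Channel-Attention-style
    fusion. entries[i] is the set of modalities token i carries: a
    singleton for a unimodal data token, a larger set for a fusion
    embedding dedicated to that modality combination. Token i may attend
    to token j when one modality set contains the other, keeping each
    fusion channel distinct."""
    n = len(entries)
    return [[entries[i] >= entries[j] or entries[j] >= entries[i]
             for j in range(n)] for i in range(n)]

# The fusion token for {audio, text} sees both unimodal tokens, while the
# audio and text tokens do not attend to each other.
mask = mca_attention_mask([{"audio"}, {"text"}, {"audio", "text"}])
```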
♻ ☆ Familiarity-Based Open-Set Recognition Under Adversarial Attacks
Open-set recognition (OSR), the identification of novel categories, can be a
critical component when deploying classification models in real-world
applications. Recent work has shown that familiarity-based scoring rules such
as the Maximum Softmax Probability (MSP) or the Maximum Logit Score (MLS) are
strong baselines when closed-set accuracy is high. However, a potential
weakness of familiarity-based OSR is its vulnerability to adversarial attacks. Here, we
study gradient-based adversarial attacks on familiarity scores for both attack
types, False Familiarity and False Novelty, and evaluate their
effectiveness in informed and uninformed settings on TinyImageNet. Furthermore,
we explore how novel and familiar samples react to adversarial attacks and
formulate the adversarial reaction score as an alternative OSR scoring rule,
which shows a high correlation with the MLS familiarity score.
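As a concrete illustration of the two familiarity scores named above, a minimal sketch (the logit values are hypothetical):

```python
import math

def familiarity_scores(logits):
    """Compute the two familiarity-based OSR scores from a logit vector.

    MLS: Maximum Logit Score. MSP: Maximum Softmax Probability.
    Low scores flag a sample as novel (open-set)."""
    mls = max(logits)
    exps = [math.exp(z - mls) for z in logits]  # shift for numerical stability
    msp = max(exps) / sum(exps)
    return msp, mls

# A sample is rejected as novel when its score falls below a threshold
# calibrated on known (closed-set) data.
msp, mls = familiarity_scores([2.0, 0.0, 0.0])
```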
comment: Published in: Proceedings of the 6th Northern Lights Deep Learning
Conference (NLDL), PMLR 265, 2025
♻ ☆ Accurate RNA 3D structure prediction using a language model-based deep learning approach
Tao Shen, Zhihang Hu, Siqi Sun, Di Liu, Felix Wong, Jiuming Wang, Jiayang Chen, Yixuan Wang, Liang Hong, Jin Xiao, Liangzhen Zheng, Tejas Krishnamoorthi, Irwin King, Sheng Wang, Peng Yin, James J. Collins, Yu Li
Accurate prediction of RNA three-dimensional (3D) structure remains an
unsolved challenge. Determining RNA 3D structures is crucial for understanding
their functions and informing RNA-targeting drug development and synthetic
biology design. The structural flexibility of RNA, which leads to scarcity of
experimentally determined data, complicates computational prediction efforts.
Here, we present RhoFold+, an RNA language model-based deep learning method
that accurately predicts 3D structures of single-chain RNAs from sequences. By
integrating an RNA language model pre-trained on ~23.7 million RNA sequences
and leveraging techniques to address data scarcity, RhoFold+ offers a fully
automated end-to-end pipeline for RNA 3D structure prediction. Retrospective
evaluations on RNA-Puzzles and CASP15 natural RNA targets demonstrate
RhoFold+'s superiority over existing methods, including human expert groups.
Its efficacy and generalizability are further validated through cross-family
and cross-type assessments, as well as time-censored benchmarks. Additionally,
RhoFold+ predicts RNA secondary structures and inter-helical angles, providing
empirically verifiable features that broaden its applicability to RNA structure
and function studies.
comment: 23 pages, 5 figures. A revised version is published in Nature Methods
21, 2287-2298 (2024). doi:10.1038/s41592-024-02487-0
♻ ☆ Text2Data: Low-Resource Data Generation with Textual Control AAAI
Shiyu Wang, Yihao Feng, Tian Lan, Ning Yu, Yu Bai, Ran Xu, Huan Wang, Caiming Xiong, Silvio Savarese
Natural language serves as a common and straightforward signal for humans to
interact seamlessly with machines. Recognizing the importance of this
interface, the machine learning community is investing considerable effort in
generating data that is semantically coherent with textual instructions. While
strides have been made in text-to-data generation spanning image editing, audio
synthesis, video creation, and beyond, low-resource areas characterized by
expensive annotations or complex data structures, such as molecules, motion
dynamics, and time series, often lack textual labels. This deficiency impedes
supervised learning, thereby constraining the application of advanced
generative models for text-to-data tasks. In response to these challenges in
the low-resource scenario, we propose Text2Data, a novel approach that utilizes
unlabeled data to understand the underlying data distribution through an
unsupervised diffusion model. Subsequently, it undergoes controllable
finetuning via a novel constraint optimization-based learning objective that
ensures controllability and effectively counteracts catastrophic forgetting.
Comprehensive experiments demonstrate that Text2Data is able to achieve
enhanced performance regarding controllability across various modalities,
including molecules, motions and time series, when compared to existing
baselines.
comment: Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25)
♻ ☆ Des-q: a quantum algorithm to provably speedup retraining of decision trees
Decision trees are widely adopted machine learning models due to their
simplicity and explainability. However, as training data size grows, standard
methods become increasingly slow, scaling polynomially with the number of
training examples. In this work, we introduce Des-q, a novel quantum algorithm
to construct and retrain decision trees for regression and binary
classification tasks. Assuming the data stream produces small, periodic
increments of new training examples, Des-q significantly reduces the tree
retraining time. Des-q achieves a logarithmic complexity in the combined total
number of old and new examples, even accounting for the time needed to load the
new samples into quantum-accessible memory. Our approach to grow the tree from
any given node involves performing piecewise linear splits to generate multiple
hyperplanes, thus partitioning the input feature space into distinct regions.
To determine the suitable anchor points for these splits, we develop an
efficient quantum-supervised clustering method, building upon the q-means
algorithm introduced by Kerenidis et al. We benchmark the simulated version of
Des-q against the state-of-the-art classical methods on multiple data sets and
observe that our algorithm exhibits similar performance to the state-of-the-art
decision trees while significantly speeding up the periodic tree retraining.
comment: 44 pages, 5 figures, 4 tables
♻ ☆ Task Singular Vectors: Reducing Task Interference in Model Merging
Antonio Andrea Gargiulo, Donato Crisostomi, Maria Sofia Bucarelli, Simone Scardapane, Fabrizio Silvestri, Emanuele Rodolà
Task Arithmetic has emerged as a simple yet effective method to merge models
without additional training. However, by treating entire networks as flat
parameter vectors, it overlooks key structural information and is susceptible
to task interference. In this paper, we study task vectors at the layer level,
focusing on task layer matrices and their singular value decomposition. In
particular, we concentrate on the resulting singular vectors, which we refer to
as Task Singular Vectors (TSV). Recognizing that layer task matrices are often
low-rank, we propose TSV-Compress (TSV-C), a simple procedure that compresses
them to 10% of their original size while retaining 99% of accuracy. We further
leverage this low-rank space to define a new measure of task interference based
on the interaction of singular vectors from different tasks. Building on these
findings, we introduce TSV-Merge (TSV-M), a novel model merging approach that
combines compression with interference reduction, significantly outperforming
existing methods.
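The low-rank structure that TSV exploits can be illustrated with a bare-bones power iteration that extracts the leading singular triplet of a toy task layer matrix; this is our own sketch, not the paper's implementation:

```python
import math
import random

def top_singular_triplet(A, iters=200, seed=0):
    """Leading singular triplet (u, sigma, v) of a matrix A (list of rows)
    via power iteration on A^T A. Truncating a task matrix to its top
    singular directions is the idea behind TSV-style compression."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    v = [rng.random() + 0.1 for _ in range(n)]  # nonzero random start
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]
    return u, sigma, v
```

Repeating this on the deflated matrix would yield the next singular directions, from which a rank-k compressed task matrix can be assembled.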
comment: 20 pages, 17 figures, 6 tables; major changes to figure style,
minor fixes, typos corrected
♻ ☆ In-Trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates
Inverse reinforcement learning (IRL) aims to learn a reward function and a
corresponding policy that best fit the demonstrated trajectories of an expert.
However, current IRL works cannot learn incrementally from an ongoing
trajectory because they have to wait to collect at least one complete
trajectory to learn. To bridge the gap, this paper considers the problem of
learning a reward function and a corresponding policy while observing the
initial state-action pair of an ongoing trajectory and continually updating the
learned reward and policy as new state-action pairs of the ongoing trajectory
are observed. We formulate this problem as an online bi-level optimization
problem where the upper level dynamically adjusts the learned reward according
to the newly observed state-action pairs with the help of a meta-regularization
term, and the lower level learns the corresponding policy. We propose a novel
algorithm to solve this problem and guarantee that the algorithm achieves
sub-linear local regret $O(\sqrt{T}+\log T+\sqrt{T}\log T)$. If the reward
function is linear, we prove that the proposed algorithm achieves sub-linear
regret $O(\log T)$. Experiments are used to validate the proposed algorithm.
♻ ☆ Solving Hierarchical Information-Sharing Dec-POMDPs: An Extensive-Form Game Approach
A recent theory shows that a multi-player decentralized partially observable
Markov decision process can be transformed into an equivalent single-player
game, enabling the application of \citeauthor{bellman}'s principle of
optimality to solve the single-player game by breaking it down into
single-stage subgames. However, this approach entangles the decision variables
of all players at each single-stage subgame, resulting in backups with a
double-exponential complexity. This paper demonstrates how to disentangle these
decision variables while maintaining optimality under hierarchical information
sharing, a prominent management style in our society. To achieve this, we apply
the principle of optimality to solve any single-stage subgame by breaking it
down further into smaller subgames, enabling us to make single-player decisions
at a time. Our approach reveals that an extensive-form game whose solution
solves the single-stage subgame always exists, significantly reducing time complexity.
Our experimental results show that the algorithms leveraging these findings can
scale up to much larger multi-player games without compromising optimality.
♻ ☆ SwitchLoRA: Switched Low-Rank Adaptation Can Learn Full-Rank Information
In the training of large language models, parameter-efficient techniques such
as LoRA reduce communication overhead and memory usage during the fine-tuning
phase. However, applying such techniques directly
during the pre-training phase results in poor performance, primarily because
the premature implementation of low-rank training significantly reduces model
accuracy. Existing methods like ReLoRA and GaLore have attempted to address
this challenge by updating the low-rank subspace. However, they still fall
short of achieving the accuracy of full-rank training. Specifically, ReLoRA
restricts the frequency of updates to preserve optimizer-state consistency,
hindering its ability to closely approximate full-rank training behavior.
Meanwhile, GaLore relies on Singular Value Decomposition (SVD) to approximate
the full-rank space, which introduces accuracy loss during the approximation
process. In this paper, we introduce SwitchLoRA, a parameter-efficient training
technique that frequently and smoothly replaces the trainable parameters of
LoRA adapters with alternative parameters. SwitchLoRA updates the low-rank
subspace incrementally, targeting only a few dimensions at a time to minimize
the impact on optimizer states. This allows a higher update frequency, thereby
enhancing accuracy by enabling the updated parameters to more closely mimic
full-rank behavior during the pre-training phase. Our results demonstrate that
SwitchLoRA actually surpasses full-rank training, reducing perplexity from
15.23 to 15.01 on the LLaMA 1.3B model, while also cutting communication
overhead by 54\% and memory usage by 13\%. Furthermore, after full fine-tuning
the SwitchLoRA pre-trained model and the full-rank pre-trained model on the
GLUE benchmark, the SwitchLoRA pre-trained model showed an average accuracy
gain of about 1\% over the full-rank pre-trained model.
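A minimal sketch of the switching idea as we read it: merge one adapter dimension into the frozen weight, then re-initialize it. The details below (zeroing one side of the adapter, plain list-of-lists matrices) are our assumptions, not the authors' exact procedure:

```python
def switch_dimension(W, B, A, dim, new_a):
    """One SwitchLoRA-style switch on an effective weight W + B @ A.
    The rank-1 contribution of adapter dimension `dim` is merged into the
    frozen weight W, the matching column of B is zeroed, and the row of A
    is replaced by a fresh candidate vector. Only one dimension of the
    low-rank subspace changes, and the effective weight is unchanged at
    the moment of the switch."""
    rows, cols = len(W), len(W[0])
    for i in range(rows):
        for j in range(cols):
            W[i][j] += B[i][dim] * A[dim][j]
        B[i][dim] = 0.0
    A[dim] = list(new_a)

def effective_weight(W, B, A):
    """W + B @ A for plain list-of-lists matrices."""
    r = len(A)
    return [[W[i][j] + sum(B[i][k] * A[k][j] for k in range(r))
             for j in range(len(W[0]))] for i in range(len(W))]
```

Because the switch preserves the effective weight exactly, optimizer state is disturbed only on the single swapped dimension.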
comment: SwitchLoRA introduces an innovative parameter-efficient training
method that dynamically switches parameters throughout the entire training
period, achieving significant memory and communication-overhead savings
while preserving accuracy
♻ ☆ Variational autoencoders with latent high-dimensional steady geometric flows for dynamics
We develop Riemannian approaches to variational autoencoders (VAEs) for
PDE-type ambient data with regularizing geometric latent dynamics, which we
refer to as VAE-DLM, or VAEs with dynamical latent manifolds. We redevelop the
VAE framework such that manifold geometries, subject to our geometric flow,
embedded in Euclidean space are learned in the intermediary latent space
developed by encoders and decoders. By tailoring the geometric flow in which
the latent space evolves, we induce latent geometric properties of our
choosing, which are reflected in empirical performance. We reformulate the
traditional evidence lower bound (ELBO) loss with a considerate choice of
prior. We develop a linear geometric flow with a steady-state regularizing
term. This flow requires only automatic differentiation of one time derivative,
and can be solved in moderately high dimensions in a physics-informed approach,
allowing more expressive latent representations. We discuss how this flow can
be formulated as a gradient flow and how it maintains entropy away from metric
singularity. This, along with an eigenvalue penalization condition, helps
ensure the manifold is sufficiently large in measure, nondegenerate, and of
canonical geometry, all of which contribute to a robust representation. Our methods
focus on the modified multi-layer perceptron architecture with tanh activations
for the manifold encoder-decoder. We demonstrate, on our datasets of interest,
that our methods perform at least as well as the traditional VAE, and
oftentimes better. Our methods can outperform both this baseline and a VAE endowed with our proposed
architecture, frequently reducing out-of-distribution (OOD) error between 15%
to 35% on select datasets. We highlight our method on ambient PDEs whose
solutions maintain minimal variation in late times. We provide empirical
justification towards how we can improve robust learning for external dynamics
with VAEs.
comment: Edits and improved tables
♻ ☆ A Closer Look at Deep Learning Methods on Tabular Datasets
Tabular data is prevalent across diverse domains in machine learning. While
classical methods like tree-based models have long been effective, Deep Neural
Network (DNN)-based methods have recently demonstrated promising performance.
However, the diverse characteristics of methods and the inherent heterogeneity
of tabular datasets make understanding and interpreting tabular methods both
challenging and prone to unstable observations. In this paper, we conduct
in-depth evaluations and comprehensive analyses of tabular methods, with a
particular focus on DNN-based models, using a benchmark of over 300 tabular
datasets spanning a wide range of task types, sizes, and domains. First, we
perform an extensive comparison of 32 state-of-the-art deep and tree-based
methods, evaluating their average performance across multiple criteria.
Although method ranks vary across datasets, we empirically find that
top-performing methods tend to concentrate within a small subset of tabular
models, regardless of the criteria used. Next, we investigate whether the
training dynamics of deep tabular models can be predicted based on dataset
properties. This approach not only offers insights into the behavior of deep
tabular methods but also identifies a core set of "meta-features" that reflect
dataset heterogeneity. A complementary subset of datasets, where method ranks
are consistent with the overall benchmark, acts as a reliable probe for
further tabular analysis.
♻ ☆ Stable-V2A: Synthesis of Synchronized Sound Effects with Temporal and Semantic Controls
Riccardo Fosco Gramaccioni, Christian Marinoni, Emilian Postolache, Marco Comunità, Luca Cosmo, Joshua D. Reiss, Danilo Comminiello
Sound designers and Foley artists usually sonorize a scene, such as from a
movie or video game, by manually annotating and sonorizing each action of
interest in the video. In our case, the intent is to leave full creative
control to sound designers with a tool that allows them to bypass the more
repetitive parts of their work, thus being able to focus on the creative
aspects of sound production. We achieve this by presenting Stable-V2A, a two-stage
model consisting of: an RMS-Mapper that estimates an envelope representative of
the audio characteristics associated with the input video; and Stable-Foley, a
diffusion model based on Stable Audio Open that generates audio semantically
and temporally aligned with the target video. Temporal alignment is guaranteed
by the use of the envelope as a ControlNet input, while semantic alignment is
achieved through the use of sound representations chosen by the designer as
cross-attention conditioning of the diffusion process. We train and test our
model on Greatest Hits, a dataset commonly used to evaluate V2A models. In
addition, to test our model on a case study of interest, we introduce Walking
The Maps, a dataset of videos extracted from video games depicting animated
characters walking in different locations. Samples and code are available on our
demo page at https://ispamm.github.io/Stable-V2A.
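The RMS envelope that serves as the temporal control signal can be computed framewise; a minimal sketch (frame and hop sizes are illustrative, not the paper's settings):

```python
import math

def rms_envelope(samples, frame=1024, hop=256):
    """Frame-wise root-mean-square envelope of a mono signal: the kind of
    coarse loudness curve usable as a temporal control for audio
    generation (e.g. as a ControlNet input)."""
    env = []
    for start in range(0, len(samples) - frame + 1, hop):
        win = samples[start:start + frame]
        env.append(math.sqrt(sum(s * s for s in win) / frame))
    return env
```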
♻ ☆ A Survey of Controllable Learning: Methods and Applications in Information Retrieval
Controllability has become a crucial aspect of trustworthy machine learning,
enabling learners to meet predefined targets and adapt dynamically at test time
without requiring retraining as the targets shift. We provide a formal
definition of controllable learning (CL), and discuss its applications in
information retrieval (IR) where information needs are often complex and
dynamic. The survey categorizes CL according to what is controllable (e.g.,
multiple objectives, user portrait, scenario adaptation), who controls (users
or platforms), how control is implemented (e.g., rule-based method, Pareto
optimization, hypernetwork and others), and where to implement control (e.g.,
pre-processing, in-processing, post-processing methods). Then, we identify
challenges faced by CL across training, evaluation, task setting, and
deployment in online environments. Additionally, we outline promising
directions for CL in theoretical analysis, efficient computation, empowering
large language models, application scenarios and evaluation frameworks.
♻ ☆ Degeneracy is OK: Logarithmic Regret for Network Revenue Management with Indiscrete Distributions
We study the classical Network Revenue Management (NRM) problem with
accept/reject decisions and $T$ IID arrivals. We consider a distributional form
where each arrival must fall under a finite number of possible categories, each
with a deterministic resource consumption vector, but a random value
distributed continuously over an interval. We develop an online algorithm that
achieves $O(\log^2 T)$ regret under this model, with the only (necessary)
assumption being that the probability densities are bounded away from 0. We
derive a second result that achieves $O(\log T)$ regret under an additional
assumption of second-order growth. To our knowledge, these are the first
results achieving logarithmic-level regret in an NRM model with continuous
values that do not require any kind of "non-degeneracy" assumptions. Our
results are achieved via new techniques including a new method of bounding
myopic regret, a "semi-fluid" relaxation of the offline allocation, and an
improved bound on the "dual convergence".
♻ ☆ Upper Bounds for Learning in Reproducing Kernel Hilbert Spaces for Non IID Samples
In this paper, we study a Markov chain-based stochastic gradient algorithm in
general Hilbert spaces, aiming to approximate the optimal solution of a
quadratic loss function. We establish probabilistic upper bounds on its
convergence. We further extend these results to an online regularized learning
algorithm in reproducing kernel Hilbert spaces, where the samples are drawn
along a Markov chain trajectory and are hence non-i.i.d.
♻ ☆ Amortized Bayesian Experimental Design for Decision-Making NeurIPS 2024
Many critical decisions, such as personalized medical diagnoses and product
pricing, are made based on insights gained from designing, observing, and
analyzing a series of experiments. This highlights the crucial role of
experimental design, which goes beyond merely collecting information on system
parameters as in traditional Bayesian experimental design (BED), but also plays
a key part in facilitating downstream decision-making. Most recent BED methods
use an amortized policy network to rapidly design experiments. However, the
information gathered through these methods is suboptimal for down-the-line
decision-making, as the experiments are not inherently designed with downstream
objectives in mind. In this paper, we present an amortized decision-aware BED
framework that prioritizes maximizing downstream decision utility. We introduce
a novel architecture, the Transformer Neural Decision Process (TNDP), capable
of instantly proposing the next experimental design, whilst inferring the
downstream decision, thus effectively amortizing both tasks within a unified
workflow. We demonstrate the performance of our method across several tasks,
showing that it can deliver informative designs and facilitate accurate
decision-making.
comment: 20 pages, 6 figures. Accepted at the 38th Conference on Neural
Information Processing Systems (NeurIPS 2024)
♻ ☆ λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics
Ahmed Jaafar, Shreyas Sundara Raman, Yichen Wei, Sofia Juliani, Anneke Wernerfelt, Benedict Quartey, Ifrah Idrees, Jason Xinyu Liu, Stefanie Tellex
Efficiently learning and executing long-horizon mobile manipulation (MoMa)
tasks is crucial for advancing robotics in household and workplace settings.
However, current MoMa models are data-inefficient, underscoring the need both
for improved models and for realistic-sized benchmarks, which do not yet
exist, to evaluate their data efficiency. To address this, we introduce the LAMBDA
({\lambda}) benchmark (Long-horizon Actions for Mobile-manipulation
Benchmarking of Directed Activities), which evaluates the data efficiency of
models on language-conditioned, long-horizon, multi-room, multi-floor,
pick-and-place tasks using a dataset of manageable size that is more feasible
to collect. The benchmark includes 571 human-collected demonstrations that
provide realism and diversity in simulated and real-world settings. Unlike
planner-generated data, these trajectories offer natural variability and
replay-verifiability, ensuring robust learning and evaluation. We benchmark
several models, including learning-based models and a neuro-symbolic modular
approach combining foundation models with task and motion planning.
Learning-based models show suboptimal success rates, even when leveraging
pretrained weights, underscoring significant data inefficiencies. However, the
neuro-symbolic approach performs significantly better while being more data
efficient. Findings highlight the need for more data-efficient learning-based
MoMa approaches. {\lambda} addresses this gap by serving as a key benchmark for
evaluating the data efficiency of those future models in handling household
robotics tasks.
comment: 8 pages
♻ ☆ SAP: Corrective Machine Unlearning with Scaled Activation Projection for Label Noise Robustness
Label corruption, where training samples are mislabeled due to non-expert
annotation or adversarial attacks, significantly degrades model performance.
Acquiring large, perfectly labeled datasets is costly, and retraining models
from scratch is computationally expensive. To address this, we introduce Scaled
Activation Projection (SAP), a novel SVD (Singular Value Decomposition)-based
corrective machine unlearning algorithm. SAP mitigates label noise by
identifying a small subset of trusted samples using cross-entropy loss and
projecting model weights onto a clean activation space estimated using SVD on
these trusted samples. This process suppresses the noise introduced in
activations due to the mislabeled samples. In our experiments, we demonstrate
SAP's effectiveness on synthetic noise with different settings and real-world
label noise. SAP applied to the CIFAR dataset with 25% synthetic corruption
shows up to 6% generalization improvement. Additionally, SAP improves
generalization over noise-robust training approaches on the CIFAR dataset by
~3.2% on average. Further, we observe generalization improvements of 2.31% for a
Vision Transformer model trained on naturally corrupted Clothing1M.
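The core correction step, projecting weights onto a clean activation subspace, can be sketched as follows. The orthonormal basis stands in for the SVD of trusted-sample activations; this is our illustrative code, not the authors' implementation:

```python
def project_rows(W, basis):
    """Project each row of W onto span(basis), where `basis` is a list of
    orthonormal vectors (e.g. leading right-singular vectors of the
    trusted samples' activation matrix). Components outside the clean
    activation subspace, attributed to label noise, are suppressed."""
    out = []
    for row in W:
        proj = [0.0] * len(row)
        for b in basis:
            coeff = sum(r * x for r, x in zip(row, b))  # <row, b>
            for j, x in enumerate(b):
                proj[j] += coeff * x
        out.append(proj)
    return out
```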
♻ ☆ Multicollinearity Resolution Based on Machine Learning: A Case Study of Carbon Emissions
This study presents a general analytical framework using DBSCAN clustering
and penalized regression models to address multifactor problems with structural
complexity and multicollinearity issues, such as the carbon emissions problem. The
framework leverages DBSCAN for unsupervised learning to objectively cluster
features. Meanwhile, penalized regression considers model complexity control
and high dimensional feature selection to identify dominant influencing
factors. Applying this framework to energy consumption data for 46 industries
in China from 2000 to 2019 identified 16 categories, and we quantitatively
assessed the emission characteristics and drivers of each.
results demonstrate the framework's analytical approach can identify primary
emission sources by category, providing quantitative references for
decision-making. Overall, this framework can evaluate complex regional issues
like carbon emissions to support policymaking. This research preliminarily
validated its application value in identifying opportunities for emission
reduction worldwide.
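The feature-selection behavior that penalized regression contributes to this framework can be shown with a bare-bones coordinate-descent Lasso; a sketch under our own toy data, not the study's pipeline:

```python
def soft_threshold(x, lam):
    """Soft-thresholding operator, the proximal map of the L1 penalty."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, iters=50):
    """Coordinate-descent Lasso on a list-of-rows design matrix X. The L1
    penalty drives coefficients of weak features exactly to zero, which is
    the high-dimensional feature selection penalized regression provides."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # Partial residual leaving out feature j
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam) / z
    return beta
```

With correlated features, the penalty tends to keep one representative per cluster, which is why pairing it with an upstream clustering step (as the framework does) helps resolve multicollinearity.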
comment: AJSEA, 11 pages, 18 figures
♻ ☆ RiTTA: Modeling Event Relations in Text-to-Audio Generation
Despite significant advancements in Text-to-Audio (TTA) generation models
achieving high-fidelity audio with fine-grained context understanding, they
struggle to model the relations between audio events described in the input
text. Moreover, previous TTA methods have neither systematically explored
audio event relation modeling nor proposed frameworks to enhance this
capability. In this work, we systematically study audio event relation modeling
in TTA generation models. We first establish a benchmark for this task by: 1.
proposing a comprehensive relation corpus covering all potential relations in
real-world scenarios; 2. introducing a new audio event corpus encompassing
commonly heard sounds; and 3. proposing new evaluation metrics to assess audio
event relation modeling from various perspectives. Furthermore, we propose a
finetuning framework to enhance existing TTA models' ability to model audio
event relations. Code is available at: https://github.com/yuhanghe01/RiTTA
comment: Project Site: https://yuhanghe01.github.io/RiTTA-Proj/. Code:
https://github.com/yuhanghe01/RiTTA
♻ ☆ Predictive Model Development to Identify Failed Healing in Patients after Non-Union Fracture Surgery
Cedric Donié, Marie K. Reumann, Tony Hartung, Benedikt J. Braun, Tina Histing, Satoshi Endo, Sandra Hirche
Bone non-union is among the most severe complications associated with trauma
surgery, occurring in 10-30% of cases after long bone fractures. Treating
non-unions requires a high level of surgical expertise and often involves
multiple revision surgeries, sometimes even leading to amputation. Thus, more
accurate prognosis is crucial for patient well-being. Recent advances in
machine learning (ML) hold promise for developing models to predict non-union
healing, even when working with smaller datasets, a commonly encountered
challenge in clinical domains. To demonstrate the effectiveness of ML in
identifying candidates at risk of failed non-union healing, we applied three ML
models (logistic regression, support vector machine, and XGBoost) to the
clinical dataset TRUFFLE, which includes 797 patients with long bone non-union.
The models achieved 70% sensitivity, with specificities of 66% (XGBoost), 49%
(support vector machine), and 43% (logistic
regression). These findings offer valuable clinical insights because they
enable early identification of patients at risk of failed non-union healing
after the initial surgical revision treatment protocol.
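The reported operating point can be reproduced from a confusion matrix; a minimal helper (the example labels are hypothetical, not TRUFFLE data):

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), for
    binary labels where 1 marks failed healing (the positive class)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)
```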
♻ ☆ Tensor-Based Foundations of Ordinary Least Squares and Neural Network Regression Models
This article introduces a novel approach to the mathematical development of
Ordinary Least Squares and Neural Network regression models, diverging from
traditional methods in current Machine Learning literature. By leveraging
Tensor Analysis and fundamental matrix computations, the theoretical
foundations of both models are meticulously detailed and extended to their
complete algorithmic forms. The study culminates in the presentation of three
algorithms, including a streamlined version of the Backpropagation Algorithm
for Neural Networks, illustrating the benefits of this new mathematical
approach.
comment: 16 pages, 3 algorithms
♻ ☆ Detecting Financial Bots on the Ethereum Blockchain
The integration of bots in Distributed Ledger Technologies (DLTs) fosters
efficiency and automation. However, their use is also associated with predatory
trading and market manipulation, and can pose threats to system integrity. It
is therefore essential to understand the extent of bot deployment in DLTs;
despite this, current detection systems are predominantly rule-based and lack
flexibility. In this study, we present a novel approach that utilizes machine
learning for the detection of financial bots on the Ethereum platform. First,
we systematize existing scientific literature and collect anecdotal evidence to
establish a taxonomy for financial bots, comprising 7 categories and 24
subcategories. Next, we create a ground-truth dataset consisting of 133 human
and 137 bot addresses. Third, we employ both unsupervised and supervised
machine learning algorithms to detect bots deployed on Ethereum. The
highest-performing clustering algorithm is a Gaussian Mixture Model with an
average cluster purity of 82.6%, while the highest-performing model for binary
classification is a Random Forest with an accuracy of 83%. Our machine
learning-based detection mechanism contributes to understanding the Ethereum
ecosystem dynamics by providing additional insights into the current bot
landscape.
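Average cluster purity, the metric used above to score the Gaussian Mixture Model, is straightforward to compute: each cluster is credited with its majority class. A minimal sketch with a toy labeling (not the paper's 133 human / 137 bot address dataset):

```python
import numpy as np

def cluster_purity(cluster_ids, true_labels):
    """Average cluster purity: each cluster votes for its majority class."""
    cluster_ids = np.asarray(cluster_ids)
    true_labels = np.asarray(true_labels)
    total = 0
    for c in np.unique(cluster_ids):
        members = true_labels[cluster_ids == c]
        # count of the most common true label inside cluster c
        total += np.bincount(members).max()
    return total / len(true_labels)

# Toy ground truth: 0 = human, 1 = bot, grouped into three clusters
# by some unsupervised model.
truth    = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
clusters = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
print(cluster_purity(clusters, truth))  # cluster 0: 3/3, 1: 3/4, 2: 2/3
```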
♻ ☆ Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents
In-context reinforcement learning (ICRL) is a frontier paradigm for solving
reinforcement learning problems in the foundation model era. While ICRL
capabilities have been demonstrated in transformers through task-specific
training, the potential of Large Language Models (LLMs) out-of-the-box remains
largely unexplored. Recent findings highlight that LLMs often face challenges
when dealing with numerical contexts, and limited attention has been paid to
evaluating their performance through preference feedback generated by the
environment. This paper is the first to investigate LLMs as in-context
decision-makers under the problem of Dueling Bandits (DB), a stateless
preference-based reinforcement learning setting that extends the classic
Multi-Armed Bandit (MAB) model by querying for preference feedback. We compare
GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, Llama 3.1, and o1-Preview against nine
well-established DB algorithms. Our results reveal that our top-performing LLM,
GPT-4 Turbo, has the zero-shot relative decision-making ability to achieve
surprisingly low weak regret across all the DB environment instances by quickly
including the best arm in duels. However, an optimality gap exists between LLMs
and classic DB algorithms in terms of strong regret. LLMs struggle to converge
and consistently exploit even when explicitly prompted to do so, and are
sensitive to prompt variations. To bridge this gap, we propose an agentic flow
framework: LLM with Enhanced Algorithmic Dueling (LEAD), which integrates
off-the-shelf DB algorithms with LLM agents through fine-grained adaptive
interplay. We show that LEAD has theoretical guarantees inherited from classic
DB algorithms on both weak and strong regret. We validate its efficacy and
robustness even with noisy and adversarial prompts. The design of our framework
sheds light on how to enhance the trustworthiness of LLMs used for in-context
decision-making.
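The weak/strong regret distinction above can be made concrete: weak regret is zero whenever the best arm appears in the duel, while strong regret is zero only when both dueled arms are the best. A small sketch with a hypothetical 3-arm preference matrix and a naive uniform dueler (not LEAD or any of the paper's baselines):

```python
import numpy as np

rng = np.random.default_rng(1)

# Preference matrix P[i, j] = P(arm i beats arm j); arm 0 is the
# Condorcet winner in this hypothetical instance.
P = np.array([[0.5, 0.7, 0.8],
              [0.3, 0.5, 0.6],
              [0.2, 0.4, 0.5]])
best = 0
gap = P[best] - 0.5          # gap[j] = P(best beats j) - 1/2

weak, strong = 0.0, 0.0
for t in range(1000):
    a, b = rng.choice(3, size=2, replace=False)  # naive uniform dueler
    weak   += min(gap[a], gap[b])  # 0 whenever the best arm is in the duel
    strong += max(gap[a], gap[b])  # 0 only for the duel (best, best)
print(weak, strong)
```

An algorithm that quickly includes the best arm in duels (as GPT-4 Turbo does, per the abstract) drives weak regret down; keeping strong regret low additionally requires converging to dueling the best arm against itself.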
♻ ☆ Hyperparameter Importance Analysis for Multi-Objective AutoML
Hyperparameter optimization plays a pivotal role in enhancing the predictive
performance and generalization capabilities of ML models. However, in many
applications, we do not only care about predictive performance but also about
additional objectives such as inference time, memory, or energy consumption. In
such multi-objective scenarios, determining the importance of hyperparameters
poses a significant challenge due to the complex interplay between the
conflicting objectives. In this paper, we propose the first method for
assessing the importance of hyperparameters in multi-objective hyperparameter
optimization. Our approach leverages surrogate-based hyperparameter importance
measures, i.e., fANOVA and ablation paths, to provide insights into the impact
of hyperparameters on the optimization objectives. Specifically, we compute the
a-priori scalarization of the objectives and determine the importance of the
hyperparameters for different objective tradeoffs. Through extensive empirical
evaluations on diverse benchmark datasets with three different objective pairs,
each pairing accuracy with time, demographic parity loss, or energy
consumption, we demonstrate the effectiveness and robustness of our proposed
method. Our findings not only offer valuable guidance for hyperparameter tuning
in multi-objective optimization tasks but also contribute to advancing the
understanding of hyperparameter importance in complex optimization scenarios.
comment: Presented at the 27th European Conference on Artificial Intelligence,
19-24 October 2024, Santiago de Compostela, Spain
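The a-priori scalarization idea can be sketched with a toy fANOVA-style main-effect importance: scalarize the objectives with a fixed tradeoff weight, then measure how much of the scalarized variance each hyperparameter's marginal explains. The surfaces and weights below are invented for illustration only:

```python
import numpy as np

# Toy objective surfaces over a grid of two hyperparameters:
# hp1 mostly drives error, hp2 mostly drives inference time.
hp1 = np.linspace(0, 1, 21)
hp2 = np.linspace(0, 1, 21)
H1, H2 = np.meshgrid(hp1, hp2, indexing="ij")
error = (H1 - 0.7) ** 2 + 0.01 * H2
time_ = 0.02 * H1 + H2 ** 2

def importance(weight):
    """fANOVA-style main-effect importance under an a-priori scalarization."""
    y = weight * error + (1 - weight) * time_  # scalarized objective
    main1 = y.mean(axis=1)                     # marginal over hp2
    main2 = y.mean(axis=0)                     # marginal over hp1
    total = y.var()
    return main1.var() / total, main2.var() / total

# As the tradeoff shifts toward accuracy, hp1's importance grows.
imp_err = importance(0.9)
imp_time = importance(0.1)
print(imp_err, imp_time)
```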
♻ ☆ Generative Modelling with High-Order Langevin Dynamics WACV2024
Diffusion generative modelling (DGM) based on stochastic differential
equations (SDEs) with score matching has achieved unprecedented results in data
generation. In this paper, we propose a novel fast high-quality generative
modelling method based on high-order Langevin dynamics (HOLD) with score
matching. This approach is demonstrated with third-order Langevin dynamics. By
augmenting previous SDEs, e.g., variance-exploding or variance-preserving SDEs for
single-data variable processes, HOLD can simultaneously model position,
velocity, and acceleration, thereby improving the quality and speed of the data
generation at the same time. HOLD is composed of one Ornstein-Uhlenbeck process
and two Hamiltonians, which reduce the mixing time by two orders of magnitude.
Empirical experiments for unconditional image generation on the public data set
CIFAR-10 and CelebA-HQ show significant gains in both Fréchet inception
distance (FID) and negative log-likelihood, achieving a state-of-the-art FID of
1.85 on CIFAR-10.
comment: Some of the results in this paper have been published at conferences,
such as WACV2024, ICASSP2024, and ICME2024
♻ ☆ Enhancing Preference-based Linear Bandits via Human Response Time NeurIPS 2024
Interactive preference learning systems infer human preferences by presenting
queries as pairs of options and collecting binary choices. Although binary
choices are simple and widely used, they provide limited information about
preference strength. To address this, we leverage human response times, which
are inversely related to preference strength, as an additional signal. We
propose a computationally efficient method that combines choices and response
times to estimate human utility functions, grounded in the EZ diffusion model
from psychology. Theoretical and empirical analyses show that for queries with
strong preferences, response times complement choices by providing extra
information about preference strength, leading to significantly improved
utility estimation. We incorporate this estimator into preference-based linear
bandits for fixed-budget best-arm identification. Simulations on three
real-world datasets demonstrate that using response times significantly
accelerates preference learning compared to choice-only approaches. Additional
materials, such as code, slides, and talk video, are available at
https://shenlirobot.github.io/pages/NeurIPS24.html
comment: NeurIPS 2024 (Oral) camera ready
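The inverse relation between response time and preference strength that the method exploits falls out of diffusion models of choice directly: a larger utility gap means a larger drift, hence faster barrier crossing. A simulation sketch (a generic random-walk diffusion, not the paper's EZ-model estimator):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_ddm(drift, barrier=1.0, dt=0.01, noise=1.0):
    """Random-walk diffusion: evidence drifts toward the preferred option;
    response time is the first passage to either barrier."""
    x, t = 0.0, 0.0
    while abs(x) < barrier:
        x += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    return t, x > 0  # (response time, chose option A)

def mean_rt(utility_gap, n=300):
    return np.mean([simulate_ddm(utility_gap)[0] for _ in range(n)])

# Larger utility gaps (stronger preferences) yield faster responses.
rt_weak, rt_strong = mean_rt(0.2), mean_rt(2.0)
print(rt_weak, rt_strong)
```

Inverting this relation is what lets observed response times carry extra information about utility differences beyond the binary choice itself.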
♻ ☆ Physically Constrained Generative Adversarial Networks for Improving Precipitation Fields from Earth System Models
Precipitation results from complex processes across many scales, making its
accurate simulation in Earth system models (ESMs) challenging. Existing
post-processing methods can improve ESM simulations locally, but cannot correct
errors in modelled spatial patterns. Here we propose a framework based on
physically constrained generative adversarial networks (GANs) to improve local
distributions and spatial structure simultaneously. We apply our approach to
the computationally efficient ESM CM2Mc-LPJmL. Our method outperforms existing
ones in correcting local distributions, and leads to strongly improved spatial
patterns especially regarding the intermittency of daily precipitation.
Notably, a double-peaked Intertropical Convergence Zone, a common problem in
ESMs, is removed. Enforcing a physical constraint to preserve global
precipitation sums, the GAN can generalize to future climate scenarios unseen
during training. Feature attribution shows that the GAN identifies regions
where the ESM exhibits strong biases. Our method constitutes a general
framework for correcting ESM variables and enables realistic simulations at a
fraction of the computational costs.
♻ ☆ Fast, Scale-Adaptive, and Uncertainty-Aware Downscaling of Earth System Model Fields with Generative Machine Learning
Accurate and high-resolution Earth system model (ESM) simulations are
essential to assess the ecological and socio-economic impacts of anthropogenic
climate change, but are computationally too expensive to be run at sufficiently
high spatial resolution. Recent machine learning approaches have shown
promising results in downscaling ESM simulations, outperforming
state-of-the-art statistical approaches. However, existing methods require
computationally costly retraining for each ESM and extrapolate poorly to
climates unseen during training. We address these shortcomings by learning a
consistency model (CM) that efficiently and accurately downscales arbitrary ESM
simulations without retraining in a zero-shot manner. Our approach yields
probabilistic downscaled fields at a resolution only limited by the
observational reference data. We show that the CM outperforms state-of-the-art
diffusion models at a fraction of computational cost while maintaining high
controllability on the downscaling task. Further, our method generalizes to
climate states unseen during training without explicitly formulated physical
constraints.
♻ ☆ EC-IoU: Orienting Safety for Object Detectors via Ego-Centric Intersection-over-Union
This paper presents Ego-Centric Intersection-over-Union (EC-IoU), addressing
the limitation of the standard IoU measure in characterizing safety-related
performance for object detectors in navigating contexts. Concretely, we propose
a weighting mechanism to refine IoU, allowing it to assign a higher score to a
prediction that covers closer points of a ground-truth object from the ego
agent's perspective. The proposed EC-IoU measure can be used in typical
evaluation processes to select object detectors with better safety-related
performance for downstream tasks. It can also be integrated into common loss
functions for model fine-tuning. While geared towards safety, our experiment
with the KITTI dataset demonstrates the performance of a model trained on
EC-IoU can be better than that of a variant trained on IoU in terms of mean
Average Precision as well.
comment: 8 pages (IEEE double column format), 7 figures, 2 tables
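One way to see the weighting idea: sample ground-truth points, weight them by closeness to the ego agent, and reward predictions that cover the heavily weighted near side. The sketch below is a simplified illustration of this principle, not the paper's exact EC-IoU formula; the weighting function and box setup are assumptions.

```python
import numpy as np

def ego_weighted_coverage(pred, gt, ego, alpha=1.0, step=0.05):
    """Toy ego-centric score: sample the ground-truth box, weight each
    point by closeness to the ego agent, and measure how much of that
    weighted mass the prediction covers."""
    xs = np.arange(gt[0], gt[2], step)
    ys = np.arange(gt[1], gt[3], step)
    X, Y = np.meshgrid(xs, ys)
    w = 1.0 / (1.0 + alpha * np.hypot(X - ego[0], Y - ego[1]))
    covered = (X >= pred[0]) & (X < pred[2]) & (Y >= pred[1]) & (Y < pred[3])
    return w[covered].sum() / w.sum()

ego = (1.0, 0.0)
gt = (0.0, 2.0, 2.0, 4.0)      # boxes are (x1, y1, x2, y2)
near = (0.0, 1.5, 2.0, 3.5)    # covers the side of gt facing the ego
far  = (0.0, 2.5, 2.0, 4.5)    # covers the side facing away
print(ego_weighted_coverage(near, gt, ego),
      ego_weighted_coverage(far, gt, ego))
```

Both predictions have identical plain IoU with the ground truth, but the near-side prediction scores higher under the ego-centric weighting, which is the safety intuition the measure encodes.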
♻ ☆ Spectral Enhancement and Pseudo-Anchor Guidance for Infrared-Visible Person Re-Identification
The development of deep learning has facilitated the application of person
re-identification (ReID) technology in intelligent security. Visible-infrared
person re-identification (VI-ReID) aims to match pedestrians across infrared
and visible modality images, enabling 24-hour surveillance. Current studies
rely on unsupervised modality transformations and inefficient embedding
constraints to bridge the spectral differences between infrared and visible
images, which limits their potential performance. To tackle the
limitations of the above approaches, this paper introduces a simple yet
effective Spectral Enhancement and Pseudo-anchor Guidance Network, named
SEPG-Net. Specifically, we propose a more homogeneous spectral enhancement
scheme based on frequency domain information and greyscale space, which avoids
the information loss typically caused by inefficient modality transformations.
Further, a Pseudo Anchor-guided Bidirectional Aggregation (PABA) loss is
introduced to bridge local modality discrepancies while better preserving
discriminative identity embeddings. Experimental results on two public
benchmark datasets demonstrate the superior performance of SEPG-Net against
other state-of-the-art methods. The code is available at
https://github.com/1024AILab/ReID-SEPG.
♻ ☆ Baichuan4-Finance Technical Report
Large language models (LLMs) have demonstrated strong capabilities in
language understanding, generation, and reasoning, yet their potential in
finance remains underexplored due to the complexity and specialization of
financial knowledge. In this work, we report the development of the
Baichuan4-Finance series, including the foundational model
Baichuan4-Finance-Base and the aligned language model Baichuan4-Finance, both
built upon the Baichuan4-Turbo base model and tailored for the finance domain.
Firstly, we have dedicated significant effort to building a detailed pipeline
for improving data quality. Moreover, in the continual pre-training phase, we
propose a novel domain self-constraint training strategy, which enables
Baichuan4-Finance-Base to acquire financial knowledge without losing general
capabilities. After Supervised Fine-tuning and Reinforcement Learning from
Human Feedback and AI Feedback, the chat model Baichuan4-Finance is able to
tackle various financial certification questions and real-world scenario
applications. We evaluate Baichuan4-Finance on many widely used general
datasets and two holistic financial benchmarks. The evaluation results show
that Baichuan4-Finance-Base surpasses almost all competitive baselines on
financial tasks by significant margins without sacrificing performance on
general LLM benchmarks. At the same time, Baichuan4-Finance demonstrates even
more impressive performance on financial application scenarios, showcasing its
potential to foster community innovation in the financial LLM field.
♻ ☆ FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system
Recently, large language models (LLMs) have achieved significant progress in
automated code generation. Despite their strong instruction-following
capabilities, these models frequently struggled to align with user intent in
coding scenarios. In particular, they were hampered by datasets that lacked
diversity and failed to address specialized tasks or edge cases. Furthermore,
challenges in supervised fine-tuning (SFT) and reinforcement learning from
human feedback (RLHF) led to failures in generating precise,
human-intent-aligned code. To tackle these challenges and improve the code
generation performance for automated programming systems, we propose
Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization
(i.e., FALCON). FALCON is structured into two hierarchical levels. From the
global level, long-term memory improves code quality by retaining and applying
learned knowledge. At the local level, short-term memory allows for the
incorporation of immediate feedback from compilers and AI systems.
Additionally, we introduce meta-reinforcement learning with feedback rewards to
solve the global-local bi-level optimization problem and enhance the model's
adaptability across diverse code generation tasks. Extensive experiments
demonstrate that our technique achieves state-of-the-art performance, leading
other reinforcement learning methods by more than 4.5 percentage points on the
MBPP benchmark and 6.1 percentage points on the HumanEval benchmark. The
open-sourced code is publicly available at https://github.com/titurte/FALCON.
comment: 20 pages, 7 figures
♻ ☆ COMET: Combined Matrix for Elucidating Targets
Haojie Wang, Zhe Zhang, Haotian Gao, Xiangying Zhang, Jingyuan Li, Zhihang Chen, Xinchong Chen, Yifei Qi, Yan Li, Renxiao Wang
Identifying the interaction targets of bioactive compounds is a foundational
element for deciphering their pharmacological effects. Target prediction
algorithms equip researchers with an effective tool to rapidly scope and
explore potential targets. Here, we introduce COMET, a multi-technological
modular target prediction tool that provides comprehensive predictive insights,
including similar active compounds, three-dimensional predicted binding modes,
and probability scores, all within an average processing time of less than 10
minutes per task. With meticulously curated data, the COMET database
encompasses 990,944 drug-target interaction pairs and 45,035 binding pockets,
enabling predictions for 2,685 targets, which span confirmed and exploratory
therapeutic targets for human diseases. In comparative testing using datasets
from ChEMBL and BindingDB, COMET outperformed five other well-known algorithms,
offering nearly an 80% probability of accurately identifying at least one true
target within the top 15 predictions for a given compound. COMET also features
a user-friendly web server, accessible freely at
https://www.pdbbind-plus.org.cn/comet.
♻ ☆ Trajectory Representation Learning on Road Networks and Grids with Spatio-Temporal Dynamics
Trajectory representation learning is a fundamental task for applications in
fields including smart cities and urban planning, as it facilitates the
utilization of trajectory data (e.g., vehicle movements) for various downstream
applications, such as trajectory similarity computation or travel time
estimation. This is achieved by learning low-dimensional representations from
high-dimensional and raw trajectory data. However, existing methods for
trajectory representation learning either rely on grid-based or road-based
representations, which are inherently different and thus, could lose
information contained in the other modality. Moreover, these methods overlook
the dynamic nature of urban traffic, relying on static road network features
rather than time-varying traffic patterns. In this paper, we propose TIGR, a
novel model designed to integrate grid and road network modalities while
incorporating spatio-temporal dynamics to learn rich, general-purpose
representations of trajectories. We evaluate TIGR on two real-world datasets and
demonstrate the effectiveness of combining both modalities by substantially
outperforming state-of-the-art methods, i.e., up to 43.22% for trajectory
similarity, up to 16.65% for travel time estimation, and up to 10.16% for
destination prediction.
♻ ☆ Improving Graph Neural Network Training Efficiency By Using Top Non-Robust Samples In The Training Set
Graph Neural Networks (GNNs) are a highly effective neural network
architecture for processing graph-structured data. Unlike traditional neural
networks that rely solely on the features of the data as input, GNNs leverage
both the graph structure, which represents the relationships between data
points, and the feature matrix of the data to optimize their feature
representation. This unique capability enables GNNs to achieve superior
performance across various tasks. However, it also makes GNNs more susceptible
to noise from both the graph structure and the data features, which can
significantly degrade their performance in common tasks such as classification
and prediction. To address this issue, this paper proposes a novel method for
constructing training sets by identifying training samples that are
particularly sensitive to noise for a given model. These samples are then used
to enhance the model's ability to handle noise-prone instances effectively.
Experimental results demonstrate that this approach can significantly improve
training efficiency.
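The sample-selection idea can be sketched simply: perturb each training sample's features, count how often the model's prediction flips, and keep the most sensitive samples. The model below is a toy linear scorer rather than a GNN, and the flip-counting heuristic is an assumption for illustration, not the paper's exact criterion.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-in for a trained model: a fixed linear scorer over features.
W = rng.normal(size=(4, 3))   # 4 features -> 3 classes
X = rng.normal(size=(50, 4))  # 50 training samples

def predict(features):
    return (features @ W).argmax(axis=1)

def noise_sensitivity(X, n_trials=20, sigma=0.3):
    """Fraction of random feature perturbations that flip each sample's
    prediction -- a simple proxy for non-robustness."""
    base = predict(X)
    flips = np.zeros(len(X))
    for _ in range(n_trials):
        noisy = X + rng.normal(scale=sigma, size=X.shape)
        flips += predict(noisy) != base
    return flips / n_trials

sens = noise_sensitivity(X)
top_non_robust = np.argsort(-sens)[:10]  # candidates for the training set
print(top_non_robust)
```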
♻ ☆ The Initial Screening Order Problem WSDM'25
We investigate the role of the initial screening order (ISO) in candidate
screening. The ISO refers to the order in which the screener searches the
candidate pool when selecting $k$ candidates. Today, it is common for the ISO
to be the product of an information access system, such as an online platform
or a database query. The ISO has been largely overlooked in the literature,
despite its impact on the optimality and fairness of the selected $k$
candidates, especially under a human screener. We define two problem
formulations describing the search behavior of the screener given an ISO: the
best-$k$, where it selects the top $k$ candidates; and the good-$k$, where it
selects the first good-enough $k$ candidates. To study the impact of the ISO,
we introduce a human-like screener and compare it to its algorithmic
counterpart, where the human-like screener is conceived to be inconsistent over
time. Our analysis, in particular, shows that the ISO, under a human-like
screener solving for the good-$k$ problem, hinders individual fairness despite
meeting group fairness, and hampers the optimality of the selected $k$
candidates. This is due to position bias, where a candidate's evaluation is
affected by its position within the ISO. We report extensive simulated
experiments exploring the parameters of the best-$k$ and good-$k$ problems for
both screeners. Our simulation framework is flexible enough to account for
multiple candidate screening tasks, being an alternative to running real-world
procedures.
comment: Forthcoming in the Eighteenth ACM International Conference on Web
Search and Data Mining (WSDM'25)
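The two search behaviors can be contrasted in a few lines: best-$k$ is invariant to the ISO (the whole pool is scanned), while good-$k$ depends on it entirely, which is where position bias enters. A minimal simulation sketch; the pool size and good-enough threshold are invented:

```python
import numpy as np

rng = np.random.default_rng(4)

scores = rng.uniform(size=30)  # true candidate quality
iso = rng.permutation(30)      # initial screening order

def best_k(iso, scores, k):
    """Scan the whole pool, then keep the top k (ISO-independent)."""
    ranked = sorted(iso, key=lambda i: -scores[i])
    return set(ranked[:k])

def good_k(iso, scores, k, threshold=0.5):
    """Stop at the first k good-enough candidates in ISO order."""
    picked = [i for i in iso if scores[i] >= threshold][:k]
    return set(picked)

k = 5
print(best_k(iso, scores, k), good_k(iso, scores, k))
```

Reversing the ISO leaves the best-$k$ selection unchanged but generally changes the good-$k$ selection, so candidates placed late in the ISO are systematically disadvantaged under a good-$k$ screener.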
♻ ☆ Detection and classification of DDoS flooding attacks by machine learning method
This study focuses on a method for detecting and classifying distributed
denial of service (DDoS) attacks, such as SYN Flooding, ACK Flooding, HTTP
Flooding, and UDP Flooding, using neural networks. Machine learning,
particularly neural networks, is highly effective in detecting malicious
traffic. A dataset containing normal traffic and various DDoS attacks was used
to train a neural network model with a 24-106-5 architecture. The model
achieved high Accuracy (99.35%), Precision (99.32%), Recall (99.54%), and
F-score (0.99) in the classification task. All major attack types were
correctly identified. The model was also further tested in the lab using
virtual infrastructures to generate normal and DDoS traffic. The results showed
that the model can accurately classify attacks under near-real-world
conditions, demonstrating 95.05% accuracy and balanced F-scores for all
attack types. This confirms that neural networks are an effective tool for
detecting DDoS attacks in modern information security systems.
comment: Paper Submitted to BAIT 2024 CEUR-WS, see
https://ceur-ws.org/Vol-3842/paper11.pdf
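The reported 24-106-5 architecture (24 input features, one hidden layer of 106 units, 5 output classes: normal traffic plus the four flooding attack types) is easy to write down. The activations below (ReLU and softmax) are assumptions, since the summary does not state them, and the weights are random stand-ins rather than trained parameters:

```python
import numpy as np

rng = np.random.default_rng(5)

# 24 traffic features -> 106 hidden units -> 5 traffic classes.
W1, b1 = rng.normal(scale=0.1, size=(24, 106)), np.zeros(106)
W2, b2 = rng.normal(scale=0.1, size=(106, 5)), np.zeros(5)

def classify(flows):
    h = np.maximum(flows @ W1 + b1, 0.0)  # ReLU hidden layer (assumed)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # softmax over 5 classes

batch = rng.normal(size=(8, 24))  # 8 feature vectors
probs = classify(batch)
print(probs.shape)  # (8, 5)
```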
♻ ☆ Function Basis Encoding of Numerical Features in Factorization Machines
Factorization machine (FM) variants are widely used for large scale real-time
content recommendation systems, since they offer an excellent balance between
model accuracy and low computational costs for training and inference. These
systems are trained on tabular data with both numerical and categorical
columns. Incorporating numerical columns poses a challenge, and they are
typically incorporated using a scalar transformation or binning, which can be
either learned or chosen a-priori. In this work, we provide a systematic and
theoretically-justified way to incorporate numerical features into FM variants
by encoding them into a vector of function values for a set of functions of
one's choice.
We view factorization machines as approximators of segmentized functions,
namely, functions from a field's value to the real numbers, assuming the
remaining fields are assigned some given constants, which we refer to as the
segment. From this perspective, we show that our technique yields a model that
learns segmentized functions of the numerical feature spanned by the set of
functions of one's choice, namely, the spanning coefficients vary between
segments. Hence, to improve model accuracy we advocate the use of functions
known to have strong approximation power, and offer the B-Spline basis due to
its well-known approximation power, availability in software libraries, and
efficiency. Our technique preserves fast training and inference, and requires
only a small modification of the computational graph of an FM model. Therefore,
it is easy to incorporate into an existing system to improve its performance.
Finally, we back our claims with a set of experiments, including synthetic,
performance evaluation on several data-sets, and an A/B test on a real online
advertising system which shows improved performance.
comment: Published in TMLR, 2024
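The encoding itself is simple: map a numerical feature to a vector of basis-function values. The sketch below uses degree-1 B-splines (hat functions) on uniform knots for brevity; the paper advocates higher-order B-splines for their stronger approximation power.

```python
import numpy as np

def hat_basis(x, knots):
    """Encode a numerical feature as a vector of basis-function values.
    Degree-1 B-splines (hat functions) over uniform knots are used here
    as a simple stand-in for a higher-order B-spline basis."""
    x = np.asarray(x, dtype=float)[:, None]
    k = np.asarray(knots, dtype=float)[None, :]
    width = knots[1] - knots[0]  # assumes uniformly spaced knots
    return np.maximum(0.0, 1.0 - np.abs(x - k) / width)

knots = np.linspace(0.0, 1.0, 6)
enc = hat_basis([0.0, 0.37, 1.0], knots)
print(enc.round(2))
# Each row sums to 1 with at most two non-zero entries, so the FM sees
# a smooth, local encoding instead of a single raw scalar.
```

Feeding this vector into the FM in place of the raw scalar is the small computational-graph modification the abstract refers to: the spanning coefficients of the basis are then learned per segment.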
♻ ☆ A survey of Monte Carlo methods for noisy and costly densities with application to reinforcement learning and ABC
This survey gives an overview of Monte Carlo methodologies using surrogate
models, for dealing with densities which are intractable, costly, and/or noisy.
This type of problem can be found in numerous real-world scenarios, including
stochastic optimization and reinforcement learning, where each evaluation of a
density function may incur some computationally-expensive or even physical
(real-world activity) cost, likely to give different results each time. The
surrogate model does not incur this cost, but there are important trade-offs
and considerations involved in the choice and design of such methodologies. We
classify the different methodologies into three main classes and describe
specific instances of algorithms under a unified notation. A modular scheme
which encompasses the considered methods is also presented. A range of
application scenarios is discussed, with special attention to the
likelihood-free setting and reinforcement learning. Several numerical
comparisons are also provided.
♻ ☆ MM-Path: Multi-modal, Multi-granularity Path Representation Learning -- Extended Version KDD 2025
Developing effective path representations has become increasingly essential
across various fields within intelligent transportation. Although pre-trained
path representation learning models have shown improved performance, they
predominantly focus on the topological structures from single modality data,
i.e., road networks, overlooking the geometric and contextual features
associated with path-related images, e.g., remote sensing images. Similar to
human understanding, integrating information from multiple modalities can
provide a more comprehensive view, enhancing both representation accuracy and
generalization. However, variations in information granularity impede the
semantic alignment of road network-based paths (road paths) and image-based
paths (image paths), while the heterogeneity of multi-modal data poses
substantial challenges for effective fusion and utilization. In this paper, we
propose a novel Multi-modal, Multi-granularity Path Representation Learning
Framework (MM-Path), which can learn a generic path representation by
integrating modalities from both road paths and image paths. To enhance the
alignment of multi-modal data, we develop a multi-granularity alignment
strategy that systematically associates nodes, road sub-paths, and road paths
with their corresponding image patches, ensuring the synchronization of both
detailed local information and broader global contexts. To address the
heterogeneity of multi-modal data effectively, we introduce a graph-based
cross-modal residual fusion component designed to comprehensively fuse
information across different modalities and granularities. Finally, we conduct
extensive experiments on two large-scale real-world datasets under two
downstream tasks, validating the effectiveness of the proposed MM-Path. The
code is available at: https://github.com/decisionintelligence/MM-Path.
comment: This is an extended version of the paper accepted by KDD 2025
♻ ☆ ChemDFM-X: Towards Large Multimodal Model for Chemistry
Zihan Zhao, Bo Chen, Jingpiao Li, Lu Chen, Liyang Wen, Pengyu Wang, Zichen Zhu, Danyang Zhang, Ziping Wan, Yansi Li, Zhongyang Dai, Xin Chen, Kai Yu
Rapid developments of AI tools are expected to offer unprecedented assistance
to the research of natural science including chemistry. However, neither
existing unimodal task-specific specialist models nor emerging general large
multimodal models (LMM) can cover the wide range of chemical data modality and
task categories. To address the real demands of chemists, a cross-modal
Chemical General Intelligence (CGI) system, which serves as a truly practical
and useful research assistant utilizing the great potential of LMMs, is in
great need. In this work, we introduce the first Cross-modal Dialogue
Foundation Model for Chemistry (ChemDFM-X). Diverse multimodal data are
generated from an initial modality by approximate calculations and
task-specific model predictions. This strategy creates sufficient chemical
training corpora while significantly reducing expense, resulting in an
instruction-tuning dataset containing 7.6M examples. After instruction
finetuning, ChemDFM-X is evaluated on extensive experiments of different
chemical tasks with various data modalities. The results demonstrate the
capacity of ChemDFM-X for multimodal and inter-modal knowledge comprehension.
ChemDFM-X marks a significant milestone toward aligning all modalities in
chemistry, a step closer to CGI.
comment: 19 pages, 7 figures, 11 tables
♻ ☆ Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
The burgeoning landscape of text-to-image models, exemplified by innovations
such as Midjourney and DALLE 3, has revolutionized content creation across
diverse sectors. However, these advancements bring forth critical ethical
concerns, particularly with the misuse of open-source models to generate
content that violates societal norms. Addressing this, we introduce
Ethical-Lens, a framework designed to facilitate the value-aligned usage of
text-to-image tools without necessitating internal model revision. Ethical-Lens
ensures value alignment in text-to-image models across toxicity and bias
dimensions by refining user commands and rectifying model outputs. Systematic
evaluation metrics, combining GPT4-V, HEIM, and FairFace scores, assess
alignment capability. Our experiments reveal that Ethical-Lens enhances
alignment capabilities to levels comparable with or superior to commercial
models like DALLE 3, ensuring user-generated content adheres to ethical
standards while maintaining image quality. This study indicates the potential
of Ethical-Lens to ensure the sustainable development of open-source
text-to-image tools and their beneficial integration into society. Our code is
available at https://github.com/yuzhu-cai/Ethical-Lens.
comment: 51 pages, 15 figures, 32 tables
♻ ☆ Rethinking Performance Analysis for Configurable Software Systems: A Case Study from a Fitness Landscape Perspective ISSTA 2025
Modern software systems are often highly configurable to tailor varied
requirements from diverse stakeholders. Understanding the mapping between
configurations and the desired performance attributes plays a fundamental role
in advancing the controllability and tuning of the underlying system, yet has
long been a dark hole of knowledge due to its black-box nature. While there
have been previous efforts in performance analysis for these systems, they
analyze the configurations as isolated data points without considering their
inherent spatial relationships. This renders them incapable of interrogating
many important aspects of the configuration space like local optima. In this
work, we advocate a novel perspective to rethink performance analysis --
modeling the configuration space as a structured "landscape". To support this
proposition, we designed \our, an open-source, graph data mining empowered
fitness landscape analysis (FLA) framework. By applying this framework to 86M
benchmarked configurations from 32 running workloads of 3 real-world
systems, we arrived at 6 main findings, which together constitute a holistic
picture of the landscape topography, with thorough discussions about their
implications on both configuration tuning and performance modeling.
comment: 23 pages, 8 figures, accepted as a conference paper at ISSTA 2025
♻ ☆ A Competition Winning Deep Reinforcement Learning Agent in microRTS
Scripted agents have predominantly won the five previous iterations of the
IEEE microRTS (μRTS) competitions hosted at CIG and CoG. Despite Deep
Reinforcement Learning (DRL) algorithms making significant strides in real-time
strategy (RTS) games, their adoption in this primarily academic competition has
been limited due to the considerable training resources required and the
complexity inherent in creating and debugging such agents. RAISocketAI is the
first DRL agent to win the IEEE microRTS competition. In a benchmark without
performance constraints, RAISocketAI regularly defeated the two prior
competition winners. This first competition-winning DRL submission can be a
benchmark for future microRTS competitions and a starting point for future DRL
research. Iteratively fine-tuning the base policy and transfer learning to
specific maps were critical to RAISocketAI's winning performance. These
strategies can be used to economically train future DRL agents. Further work in
Imitation Learning using Behavior Cloning and fine-tuning these models with DRL
has proven promising as an efficient way to bootstrap models with demonstrated,
competitive behaviors.
comment: Best paper award nominee at IEEE Conference on Games 2024. 19 pages,
6 figures. Source code at https://github.com/sgoodfriend/rl-algo-impls
♻ ☆ Approximation Rate of the Transformer Architecture for Sequence Modeling
The Transformer architecture is widely applied in sequence modeling
applications, yet the theoretical understanding of its working principles
remains limited. In this work, we investigate the approximation rate for
single-layer Transformers with one head. We consider a class of non-linear
relationships and identify a novel notion of complexity measures to establish
an explicit Jackson-type approximation rate estimate for the Transformer. This
rate reveals the structural properties of the Transformer and suggests the
types of sequential relationships it is best suited for approximating. In
particular, the results on approximation rates enable us to concretely analyze
the differences between the Transformer and classical sequence modeling
methods, such as recurrent neural networks.
♻ ☆ FairGP: A Scalable and Fair Graph Transformer Using Graph Partitioning AAAI 2025
Recent studies have highlighted significant fairness issues in Graph
Transformer (GT) models, particularly against subgroups defined by sensitive
features. Additionally, GTs are computationally intensive and memory-demanding,
limiting their application to large-scale graphs. Our experiments demonstrate
that graph partitioning can enhance the fairness of GT models while reducing
computational complexity. To understand this improvement, we conducted a
theoretical investigation into the root causes of fairness issues in GT models.
We found that the sensitive features of higher-order nodes disproportionately
influence lower-order nodes, resulting in sensitive feature bias. We propose
Fairness-aware scalable GT based on Graph Partitioning (FairGP), which
partitions the graph to minimize the negative impact of higher-order nodes. By
optimizing attention mechanisms, FairGP mitigates the bias introduced by global
attention, thereby enhancing fairness. Extensive empirical evaluations on six
real-world datasets validate the superior performance of FairGP in achieving
fairness compared to state-of-the-art methods. The codes are available at
https://github.com/LuoRenqiang/FairGP.
comment: 11 pages, 2 figures, Accepted at AAAI 2025
♻ ☆ Non-Homophilic Graph Pre-Training and Prompt Learning KDD 2025
Graphs are ubiquitous for modeling complex relationships between objects
across various fields. Graph neural networks (GNNs) have become a mainstream
technique for graph-based applications, but their performance heavily relies on
abundant labeled data. To reduce the labeling requirement, pre-training and prompt
learning have become a popular alternative. However, most existing prompt
methods do not differentiate homophilic and heterophilic characteristics of
real-world graphs. In particular, many real-world graphs are non-homophilic:
neither strictly nor uniformly homophilic, but mixing homophilic and
heterophilic patterns that vary across graphs and nodes. In this paper, we
propose ProNoG, a novel pre-training and prompt
learning framework for such non-homophilic graphs. First, we analyze existing
graph pre-training methods, providing theoretical insights into the choice of
pre-training tasks. Second, recognizing that each node exhibits unique
non-homophilic characteristics, we propose a conditional network to
characterize the node-specific patterns in downstream tasks. Finally, we
thoroughly evaluate and analyze ProNoG through extensive experiments on ten
public datasets.
comment: Accepted by KDD 2025
♻ ☆ Data-Driven Machine Learning Approaches for Predicting In-Hospital Sepsis Mortality
Arseniy Shumilov, Yueting Zhu, Negin Ashrafi, Armin Abdollahi, Greg Placencia, Kamiar Alaei, Maryam Pishgar
Sepsis is a severe condition responsible for many deaths in the United States
and worldwide, making accurate prediction of outcomes crucial for timely and
effective treatment. Previous studies employing machine learning faced
limitations in feature selection and model interpretability, reducing their
clinical applicability. This research aimed to develop an interpretable and
accurate machine learning model to predict in-hospital sepsis mortality,
addressing these gaps. Using ICU patient records from the MIMIC-III database,
we extracted relevant data through a combination of literature review, clinical
input refinement, and Random Forest-based feature selection, identifying the
top 35 features. Data preprocessing included cleaning, imputation,
standardization, and applying the Synthetic Minority Over-sampling Technique
(SMOTE) to address class imbalance, resulting in a dataset of 4,683 patients
with 17,429 admissions. Five models (Random Forest, Gradient Boosting, Logistic
Regression, Support Vector Machine, and K-Nearest Neighbor) were developed and
evaluated. The Random Forest model demonstrated the best performance, achieving
an accuracy of 0.90, AUROC of 0.97, precision of 0.93, recall of 0.91, and
F1-score of 0.92. These findings underscore the potential of data-driven
machine learning approaches to improve critical care, offering clinicians a
powerful tool for predicting in-hospital sepsis mortality and enhancing patient
outcomes.
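The SMOTE step in the pipeline above can be illustrated with a toy, pure-Python sketch of the core idea: interpolating between a minority-class sample and one of its nearest minority neighbors. This is an illustrative simplification, not the study's implementation (which presumably used a library such as imbalanced-learn); `minority` is a hypothetical list of feature tuples.

```python
import random

_rng = random.Random(0)  # fixed seed for reproducibility of this sketch

def smote_sample(minority, k=3, rng=_rng):
    """Create one synthetic sample: pick a random minority point and
    interpolate toward one of its k nearest minority neighbors."""
    x = rng.choice(minority)
    others = [p for p in minority if p is not x]
    # sort remaining minority points by squared Euclidean distance to x
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
    neighbor = rng.choice(others[:k])
    gap = rng.random()  # interpolation factor in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(x, neighbor))

def oversample(minority, n_new):
    """Generate n_new synthetic minority samples."""
    return [smote_sample(minority) for _ in range(n_new)]
```

Because synthetic points lie on segments between existing minority points, they stay inside the convex hull of the minority class, which is what makes SMOTE safer than naive duplication.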
♻ ☆ BiasJailbreak: Analyzing Ethical Biases and Jailbreak Vulnerabilities in Large Language Models
Although large language models (LLMs) demonstrate impressive proficiency in
various tasks, they present potential safety risks, such as 'jailbreaks', where
malicious inputs can coerce LLMs into generating harmful content that bypasses
safety alignment. In this paper, we delve into the ethical biases in LLMs and
examine how those biases could be exploited for jailbreaks. Notably, these
biases result in a jailbreaking success rate in GPT-4o models that differs by
20% between non-binary and cisgender keywords and by 16% between white and
black keywords, even when the other parts of the prompts are identical. We
introduce the concept of BiasJailbreak, highlighting the inherent risks posed
by these safety-induced biases. BiasJailbreak generates biased keywords
automatically by asking the target LLM itself, and utilizes the keywords to
generate harmful output. Additionally, we propose an efficient defense method
BiasDefense, which prevents jailbreak attempts by injecting defense prompts
prior to generation. BiasDefense stands as an appealing alternative to Guard
Models, such as Llama-Guard, that require additional inference cost after text
generation. Our findings emphasize that ethical biases in LLMs can actually
lead to generating unsafe output, and suggest a method to make the LLMs more
secure and unbiased. To enable further research and improvements, we
open-source our code and artifacts of BiasJailbreak, providing the community
with tools to better understand and mitigate safety-induced biases in LLMs.
♻ ☆ Conformalized Interval Arithmetic with Symmetric Calibration
Uncertainty quantification is essential in decision-making, especially when
joint distributions of random variables are involved. While conformal
prediction provides distribution-free prediction sets with valid coverage
guarantees, it traditionally focuses on single predictions. This paper
introduces novel conformal prediction methods for estimating the sum or average
of unknown labels over specific index sets. We extend conformal prediction
intervals for a single target to prediction intervals for the sum of multiple
targets. Under permutation-invariance assumptions, we prove the validity of our
proposed method. We also apply our algorithms to class average estimation and
path cost prediction tasks, and we show that our method outperforms existing
conformalized approaches as well as non-conformal approaches.
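The extension from single-target to sum intervals can be sketched with split conformal prediction plus naive interval arithmetic. This is a deliberately simplified illustration (summing per-target intervals endpoint-wise, which is conservative), not the paper's symmetric-calibration method:

```python
import math

def conformal_interval(pred, calib_residuals, alpha=0.1):
    """Split-conformal interval for one prediction: pred +/- the
    ceil((n+1)(1-alpha))-th smallest absolute calibration residual."""
    r = sorted(abs(e) for e in calib_residuals)
    n = len(r)
    k = min(n - 1, math.ceil((n + 1) * (1 - alpha)) - 1)  # 0-based index
    q = r[k]
    return pred - q, pred + q

def sum_interval(preds, calib_residuals, alpha=0.1):
    """Naive interval for a sum of targets: add the per-target
    intervals endpoint-wise (interval arithmetic)."""
    lo = hi = 0.0
    for p in preds:
        a, b = conformal_interval(p, calib_residuals, alpha)
        lo, hi = lo + a, hi + b
    return lo, hi
```

Summing intervals this way preserves validity but inflates width roughly linearly in the number of targets; reducing that inefficiency by calibrating the sum directly is the kind of gain the paper targets.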
♻ ☆ Conformal Thresholded Intervals for Efficient Regression
This paper introduces Conformal Thresholded Intervals (CTI), a novel
conformal regression method that aims to produce the smallest possible
prediction set with guaranteed coverage. Unlike existing methods that rely on
nested conformal frameworks and full conditional distribution estimation, CTI
estimates the conditional probability density for a new response to fall into
each interquantile interval using off-the-shelf multi-output quantile
regression. By leveraging the inverse relationship between interval length and
probability density, CTI constructs prediction sets by thresholding the
estimated conditional interquantile intervals based on their length. The
optimal threshold is determined using a calibration set to ensure marginal
coverage, effectively balancing the trade-off between prediction set size and
coverage. CTI's approach is computationally efficient and avoids the complexity
of estimating the full conditional distribution. The method is theoretically
grounded, with provable guarantees for marginal coverage, achieving the
smallest prediction set size given by the Neyman-Pearson lemma. Extensive experimental
results demonstrate that CTI achieves superior performance compared to
state-of-the-art conformal regression methods across various datasets,
consistently producing smaller prediction sets while maintaining the desired
coverage level. The proposed method offers a simple yet effective solution for
reliable uncertainty quantification in regression tasks, making it an
attractive choice for practitioners seeking accurate and efficient conformal
prediction.
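The thresholding step can be rendered schematically: estimate interquantile intervals per point, then keep only intervals shorter than a length threshold tuned on a calibration set (short intervals correspond to high estimated density). The code below is a schematic sketch under these assumptions, not the authors' implementation:

```python
def choose_threshold(calib_intervals, calib_labels, alpha=0.1):
    """Smallest length threshold t such that at least (1 - alpha) of
    calibration labels fall inside some kept (length <= t) interval."""
    lengths = sorted({b - a for ivs in calib_intervals for a, b in ivs})
    n = len(calib_labels)
    for t in lengths:
        covered = sum(
            any(a <= y <= b and (b - a) <= t for a, b in ivs)
            for ivs, y in zip(calib_intervals, calib_labels)
        )
        if covered >= (1 - alpha) * n:
            return t
    return lengths[-1]

def prediction_set(intervals, t):
    """Prediction set for one test point: the interquantile intervals
    no longer than the calibrated threshold."""
    return [(a, b) for a, b in intervals if (b - a) <= t]
```

Because the threshold acts only on interval lengths, the resulting prediction sets can be unions of disjoint intervals, which is how CTI-style methods stay small on multimodal conditional distributions.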
♻ ☆ Causal Deep Learning
We derive a set of causal deep neural networks whose architectures are a
consequence of tensor (multilinear) factor analysis, a framework that
facilitates forward and inverse causal inference. Forward causal questions are
addressed with a neural architecture composed of causal capsules and a tensor
transformer. Causal capsules compute a set of invariant causal factor
representations, whose interactions are governed by a tensor transformation.
Inverse causal questions are addressed with a neural network that implements
the multilinear projection algorithm. The architecture reverses the order of
the operations of a forward neural network and estimates the causes of effects.
As an alternative to aggressive bottleneck dimension reduction or regularized
regression that may camouflage an inherently underdetermined inverse problem,
we prescribe modeling different aspects of the mechanism of data formation with
piecewise tensor models whose multilinear projections produce multiple
candidate solutions. Our forward and inverse questions may be addressed with
shallow architectures, but for computationally scalable solutions, we derive a
set of deep neural networks by taking advantage of block algebra. An
interleaved kernel hierarchy results in doubly non-linear tensor factor
models. The causal neural networks that are a consequence of tensor factor
analysis are data agnostic, but are illustrated with facial images. Sequential,
parallel and asynchronous parallel computation strategies are described.
♻ ☆ Aligning the Objective of LLM-based Program Repair ICSE'25
Large language models (LLMs) have achieved decent results on automated
program repair (APR). However, the next token prediction training objective of
decoder-only LLMs (e.g., GPT-4) is misaligned with the masked span prediction
objective of current infilling-style methods, which impedes LLMs from fully
leveraging pre-trained knowledge for program repair. In addition, while some
LLMs can locate and repair bugs in certain functions using the related
artifacts (e.g., test cases), existing methods still depend on statement-level
fault localization methods to provide a list of buggy hunks for repair. This
restriction hinders LLMs from exploring potential patches beyond the given
locations.
In this paper, we investigate a new approach to adapt LLMs to program repair.
Our core insight is that LLM's APR capability can be greatly improved by simply
aligning the output to their training objective and allowing them to refine the
whole program without first identifying faulty statements. Based on this
insight, we designed D4C, a straightforward prompting framework for APR. D4C
can repair 180 bugs correctly in Defects4J, with each patch being sampled only
10 times. This surpasses the SOTA APR methods with perfect fault localization
by 10% and reduces the patch sampling number by 90%. Our findings reveal that
(1) objective alignment is crucial for fully exploiting LLM's pre-trained
capability, and (2) replacing the traditional localize-buggy-hunks-then-repair
workflow with direct debugging is more effective for LLM-based APR methods.
Thus, we believe this paper introduces a new mindset for harnessing LLMs in
APR.
comment: Accepted by ICSE'25
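A minimal prompt builder in the spirit of the direct-debugging workflow described above (whole-program rewrite rather than hunk infilling) might look as follows; the prompt wording is an illustrative guess, not D4C's actual template:

```python
def direct_repair_prompt(program: str, failing_test: str, error_log: str) -> str:
    """Ask an LLM for a complete corrected program, aligning the task
    with next-token generation instead of masked-span infilling."""
    return (
        "The following program fails its test suite.\n\n"
        f"Program:\n{program}\n\n"
        f"Failing test:\n{failing_test}\n\n"
        f"Error output:\n{error_log}\n\n"
        "Rewrite and output the complete corrected program; do not "
        "restrict the fix to any pre-identified lines."
    )
```

Note how the prompt supplies the test artifacts but no fault location, leaving the model free to patch anywhere in the program.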
♻ ☆ TeLU Activation Function for Fast and Stable Deep Learning
We propose the Hyperbolic Tangent Exponential Linear Unit (TeLU), a neural
network hidden activation function defined as TeLU(x) = x tanh(exp(x)). TeLU's
design is grounded in the core principles of key activation functions,
achieving strong convergence by closely approximating the identity function in
its active region while effectively mitigating the vanishing gradient problem
in its saturating region. Its simple formulation enhances computational
efficiency, leading to improvements in scalability and convergence speed.
Unlike many modern activation functions, TeLU seamlessly combines the
simplicity and effectiveness of ReLU with the smoothness and analytic
properties essential for learning stability in deep neural networks. TeLU's
ability to mimic the behavior and optimal hyperparameter settings of ReLU,
while introducing the benefits of smoothness and curvature, makes it an ideal
drop-in replacement. Its analytic nature positions TeLU as a powerful universal
approximator, enhancing both robustness and generalization across a multitude
of experiments. We rigorously validate these claims through theoretical
analysis and experimental validation, demonstrating TeLU's performance across
challenging benchmarks; including ResNet18 on ImageNet, Dynamic-Pooling
Transformers on Text8, and Recurrent Neural Networks (RNNs) on the Penn
TreeBank dataset. These results highlight TeLU's potential to set a new
standard in activation functions, driving more efficient and stable learning in
deep neural networks, thereby accelerating scientific discoveries across
various fields.
comment: Updated version of "Stable and Robust Deep Learning By Hyperbolic
Tangent Exponential Linear Unit (TeLU)"
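The definition TeLU(x) = x tanh(exp(x)) translates directly to code; a minimal reference version, with a simple overflow guard added for large inputs (harmless, since tanh saturates to 1 long before the clamp takes effect):

```python
import math

def telu(x: float) -> float:
    """TeLU activation: x * tanh(exp(x)).
    Approximates the identity for large positive x and decays
    smoothly toward 0 for large negative x."""
    # clamp the exponent: tanh(exp(20)) == 1.0 in double precision,
    # so the clamp changes nothing numerically, only avoids overflow
    return x * math.tanh(math.exp(min(x, 20.0)))
```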
♻ ☆ Integrated Sensing and Communications for Low-Altitude Economy: A Deep Reinforcement Learning Approach
This paper studies an integrated sensing and communications (ISAC) system for
low-altitude economy (LAE), where a ground base station (GBS) provides
communication and navigation services for authorized unmanned aerial vehicles
(UAVs), while sensing the low-altitude airspace to monitor the unauthorized
mobile target. The expected communication sum-rate over a given flight period
is maximized by jointly optimizing the beamforming at the GBS and UAVs'
trajectories, subject to the constraints on the average signal-to-noise ratio
requirement for sensing, the flight mission and collision avoidance of UAVs, as
well as the maximum transmit power at the GBS. Typically, this is a sequential
decision-making problem with the given flight mission. Thus, we transform it
into a specific Markov decision process (MDP) model called an episode task. Based on
this modeling, we propose a novel LAE-oriented ISAC scheme, referred to as Deep
LAE-ISAC (DeepLSC), by leveraging the deep reinforcement learning (DRL)
technique. In DeepLSC, a reward function and a new action selection policy
termed constrained noise-exploration policy are judiciously designed to fulfill
various constraints. To enable efficient learning in episode tasks, we develop
a hierarchical experience replay mechanism, where the gist is to employ all
experiences generated within each episode to jointly train the neural network.
Besides, to enhance the convergence speed of DeepLSC, a symmetric experience
augmentation mechanism, which simultaneously permutes the indexes of all
variables to enrich available experience sets, is proposed. Simulation results
demonstrate that compared with benchmarks, DeepLSC yields a higher sum-rate
while meeting the preset constraints, achieves faster convergence, and is more
robust against different settings.
comment: submitted for an IEEE publication
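The symmetric experience augmentation described above can be sketched as permuting the indices of the interchangeable agents (the UAVs) consistently across an experience tuple; the flat per-UAV layout below is a hypothetical simplification of the actual state/action encoding:

```python
from itertools import permutations

def augment_experience(states, actions, reward, next_states):
    """Emit one experience tuple per permutation of UAV indices.
    Because the UAVs are interchangeable, each permuted tuple is as
    valid a training sample as the original, enriching the replay set."""
    n = len(states)
    return [
        (
            [states[i] for i in perm],
            [actions[i] for i in perm],
            reward,  # the shared team reward is permutation-invariant
            [next_states[i] for i in perm],
        )
        for perm in permutations(range(n))
    ]
```

With n UAVs this multiplies each experience into n! samples, which is where the claimed convergence speed-up would come from.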
♻ ☆ Phase-aware Training Schedule Simplifies Learning in Flow-Based Generative Models
We analyze the training of a two-layer autoencoder used to parameterize a
flow-based generative model for sampling from a high-dimensional Gaussian
mixture. Previous work shows that, without an appropriate time schedule, the
phase where the relative probability between the modes is learned disappears as
the dimension goes to infinity. We introduce a time dilation that solves
this problem. This enables us to characterize the learned velocity field,
finding a first phase where the probability of each mode is learned and a
second phase where the variance of each mode is learned. We find that the
autoencoder representing the velocity field learns to simplify by estimating
only the parameters relevant to each phase. Turning to real data, we propose a
method that, for a given feature, finds intervals of time where training
improves accuracy the most on that feature. Since practitioners typically sample
training times uniformly, our method enables more efficient training.
We provide preliminary experiments validating this approach.
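Replacing the uniform distribution over training times with one concentrated on the interval found for a given feature can be sketched as a simple mixture sampler; the interval and mixture weight here are illustrative placeholders, not values from the paper:

```python
import random

_rng = random.Random(0)  # fixed seed for reproducibility of this sketch

def sample_training_time(important_interval, weight=0.5, rng=_rng):
    """Draw a training time t in [0, 1): with probability `weight`,
    sample inside the interval where training helps the target feature
    most; otherwise fall back to the usual uniform draw."""
    lo, hi = important_interval
    if rng.random() < weight:
        return lo + (hi - lo) * rng.random()
    return rng.random()
```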